WITH EXPONENTIAL PERFORMANCE CRITERIA AND
THEIR RELATION TO DETERMINISTIC DIFFERENTIAL GAMES

By

D. H. Jacobson

Division of Engineering and Applied Physics
Harvard University, Cambridge, Massachusetts


ABSTRACT

In this report two stochastic optimal control problems are solved
whose performance criteria are the expected values of exponential
functions of quadratic forms. The optimal controller is linear in both
cases but depends upon the covariance matrix of the additive process
noise, so that the Certainty Equivalence Principle does not hold. The
controllers are shown to be equivalent to those obtained by solving a
cooperative and a noncooperative quadratic (differential) game, and this
leads to some interesting interpretations and observations.

Finally, some stability properties of the asymptotic controllers
are discussed.

1. Introduction

The so-called LQG problem* of optimal stochastic control [1]
possesses a number of interesting features. First, the optimal feedback
controller is a linear (time varying) function of the state variables.
Second, this linear controller is identical to that which is obtained
by neglecting the additive gaussian noise and solving the resultant
deterministic LQP** (Certainty Equivalence Principle). Thus the
controller for the stochastic system is independent of the statistics of
the additive noise. This is appealing for small noise intensity, but
for large noise (large covariance) one has the intuitive feeling that
perhaps a different controller would be more appropriate.

In this paper we consider optimal control of linear systems disturbed
by additive gaussian noise, whose associated performance criteria are the
expected values of exponential functions of negative semi-definite and
positive semi-definite quadratic forms. We shall refer to the former
case as the LE⁻G problem and the latter as the LE⁺G problem, and to their
deterministic counterparts as LE⁻P and LE⁺P respectively. In the
deterministic cases, LE±P, the solutions are identical to that for the LQP
(the natural logarithms of the exponential performance criteria yield
quadratic forms). However, when noise is present (the LE±G problems), the
optimal controllers are different from that of the LQG problem. In
particular, though as in the case of the LQG problem these are linear
functions of the state variables, they depend explicitly upon the
covariance matrices of the additive gaussian noise. For small noise
intensity (small covariance) the solutions of the LE±G and LQG problems
are close, but for large noise intensity there is a marked difference.
In particular, as the noise intensity tends to infinity the optimal
gains for the LE⁻G problem tend to zero; intuitively this implies that
if the random input is "very wild" little can be gained (in the sense of
reducing the value of this particular performance criterion) by
controlling the system. In the LE⁺G problem the optimal controller ceases
to exist if the noise intensity is sufficiently large (that is, the
performance criterion becomes infinite, regardless of the control input).

These new controllers, which retain the simplicity of the solution
of the LQG problem, could prove to be attractive in certain applications.

In addition to formulating and solving the LE±G problems we demonstrate
that their solutions are equivalent to the solutions of cooperative and
noncooperative linear-quadratic zero-sum (differential) games. These
equivalences provide interpretations for the stochastic controllers in
terms of solutions of deterministic zero-sum games, and vice versa. It
is hoped that these equivalences will aid in the quest for new formulations
and (proofs of existence of) solutions of stochastic nonlinear systems and
nonlinear differential games.

We investigate briefly the infinite time version of the LE±G problems
and point out that the steady state optimal controller for the LE⁻G problem
is not necessarily stable. On the other hand the steady state optimal
controller for the LE⁺G problem, if it exists, is stable. Thus the
LE⁺G formulation may be preferable in the infinite time case.

* Problem with linear dynamics disturbed by additive gaussian noise,
together with a performance criterion which is the expected value
of a positive semi-definite quadratic form.

** Same as LQG problem but with noise set to zero.

2. Formulation of Discrete Time LE±G Problems
2.1 The LE⁻G Problem
a) Dynamics

We shall consider a linear discrete time dynamic system
described by

$$x_{k+1} = A_k x_k + B_k u_k + \Gamma_k \alpha_k \,;\quad k = 0,\dots,N-1 \,;\quad x_0 \ \text{given} \tag{1}$$

where the "state" vector $x_k \in R^n$, the control vector $u_k \in R^m$ and the
gaussian noise input $\alpha_k \in R^q$. The matrices $A_k$, $B_k$, $\Gamma_k$ have appropriate
dimensions and depend upon the time k.

b) Noise

The noise input is a sequence $\{\alpha_k\}$ of independently
distributed gaussian random variables having probability density

$$p_\alpha(\alpha_0,\dots,\alpha_{N-1}) = \prod_{k=0}^{N-1} p(\alpha_k;k) \tag{2}$$

where $p_\alpha : R^{q \times N} \to R^+$ and $p : R^q \times I^+ \to R^+$ is given by

$$p(\alpha_k;k) = \frac{1}{\sqrt{(2\pi)^q\,|P_k^{-1}|}} \exp\{-\tfrac{1}{2}\alpha_k^T P_k \alpha_k\} \tag{3}$$

with

$$P_k > 0 \ \text{(positive-definite)} \,;\quad k = 0,\dots,N-1 \tag{4}$$

Note that

$$\mathcal{E}[\alpha_k] = 0, \quad \mathcal{E}[\alpha_k \alpha_k^T] = P_k^{-1} \,;\quad k = 0,\dots,N-1 \tag{5}$$

where $\mathcal{E}$ denotes expectation.

c) Performance Criterion

The performance of the stochastic linear system is measured
by the criterion

$$V^-(x_0) = -\mathcal{E}_{|x_0} \prod_{k=0}^{N-1} \mu_x^-(x_k;k)\,\mu_u^-(u_k;k)\,\mu_x^-(x_N;N) \tag{6}$$

where $V^- : R^n \to [-1,0]$ and $\mu_x^- : R^n \times I^+ \to [0,1]$, $\mu_u^- : R^m \times I^+ \to [0,1]$
are given by

$$\mu_x^-(x_k;k) = \exp\{-\tfrac{1}{2} x_k^T Q_k x_k\} \,;\quad k = 0,\dots,N \tag{7}$$

$$\mu_u^-(u_k;k) = \exp\{-\tfrac{1}{2} u_k^T R_k u_k\} \,;\quad k = 0,\dots,N-1 \tag{8}$$

and

$$Q_k \ge 0 \ \text{(positive semi-definite)} \,;\quad k = 0,\dots,N \tag{9}$$

$$R_k > 0 \ \text{(positive definite)} \,;\quad k = 0,\dots,N-1 \tag{10}$$

Note that (6) can be written as

$$V^-(x_0) = -\mathcal{E}_{|x_0} \exp\Big\{-\tfrac{1}{2}\Big[\sum_{k=0}^{N-1}\big(x_k^T Q_k x_k + u_k^T R_k u_k\big) + x_N^T Q_N x_N\Big]\Big\} \tag{11}$$

d) Problem

We are required to find a policy

$$u_k = C_k(X_k) \,;\quad k = 0,\dots,N-1 \,;\quad X_k \triangleq \{x_0, x_1, \dots, x_k\} \tag{12}$$

which minimizes performance criterion (11). Thus the problem is identical
to the LQG problem except that the performance criterion is the negative
of the expected value of an exponential function of a negative
semi-definite quadratic form.

Note that $V^-(x_0)$ for arbitrary controls $\{u_k\}$ is bounded as
follows

$$-1 \le V^-(x_0) \le 0 \tag{13}$$
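
Criterion (11) and the bound (13) can be illustrated numerically. The
following is a minimal sketch, assuming a scalar system (n = m = q = 1)
and a fixed linear policy u_k = -gain * x_k; the function name, the
parameter values and the Monte Carlo sample size are illustrative
assumptions and are not taken from the report.

```python
import math
import random


def simulate_le_minus_cost(gain, a=0.9, b=1.0, g=1.0, q=1.0, r=1.0, p=4.0,
                           x0=1.0, horizon=10, samples=2000, seed=0):
    """Monte Carlo estimate of V^-(x0) in (11) for the policy u_k = -gain*x_k.

    Scalar stand-ins: a, b, g for A_k, B_k, Gamma_k; q, r for Q_k, R_k;
    p for P_k, so that the noise variance is 1/p as in (5).
    """
    rng = random.Random(seed)
    sigma = 1.0 / math.sqrt(p)  # standard deviation of alpha_k
    total = 0.0
    for _ in range(samples):
        x, quad = x0, 0.0
        for _ in range(horizon):
            u = -gain * x
            quad += q * x * x + r * u * u
            x = a * x + b * u + g * rng.gauss(0.0, sigma)  # dynamics (1)
        quad += q * x * x  # terminal term x_N^T Q_N x_N
        total += -math.exp(-0.5 * quad)
    return total / samples


if __name__ == "__main__":
    print(simulate_le_minus_cost(gain=0.5))
```

Every sample of the quantity inside the expectation lies between -1 and
0, so the estimate respects (13) for any gain.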

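2.2 The LE⁺G Problem

The formulation is the same as for the LE⁻G problem except for the
performance criterion, which is

$$V^+(x_0) = \mathcal{E}_{|x_0} \prod_{k=0}^{N-1} \mu_x^+(x_k;k)\,\mu_u^+(u_k;k)\,\mu_x^+(x_N;N) \tag{14}$$

where $V^+ : R^n \to [1,\infty]$, and $\mu_x^+ : R^n \times I^+ \to [1,\infty)$, $\mu_u^+ : R^m \times I^+ \to [1,\infty)$
are given by

$$\mu_x^+(x_k;k) = \exp\{\tfrac{1}{2} x_k^T Q_k x_k\} \,;\quad k = 0,\dots,N \tag{15}$$

$$\mu_u^+(u_k;k) = \exp\{\tfrac{1}{2} u_k^T R_k u_k\} \,;\quad k = 0,\dots,N-1 \tag{16}$$

with $Q_k$, $R_k$ as in (9), (10).

Note that (14) can be written as

$$V^+(x_0) = \mathcal{E}_{|x_0} \exp\Big\{\tfrac{1}{2}\Big[\sum_{k=0}^{N-1}\big(x_k^T Q_k x_k + u_k^T R_k u_k\big) + x_N^T Q_N x_N\Big]\Big\} \tag{17}$$

The problem is to find a policy

$$u_k = C_k(X_k) \,;\quad k = 0,\dots,N-1 \,;\quad X_k \triangleq \{x_0, x_1, \dots, x_k\} \tag{18}$$

which minimizes performance criterion (14). Again this problem is
identical to the LQG problem except that the performance criterion
is the expected value of an exponential function of a positive
semi-definite quadratic form.

Note that $V^+(x_0)$, for arbitrary controls $\{u_k\}$, satisfies

$$0 < V^+(x_0) \le \infty \tag{19}$$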
3. Formulation of LE±P

If no noise is present,

$$\alpha_k = 0 \,;\quad k = 0,\dots,N-1 \tag{20}$$

minimization of (11) and (17) is equivalent to minimization of

$$\tfrac{1}{2}\sum_{k=0}^{N-1}\big(x_k^T Q_k x_k + u_k^T R_k u_k\big) + \tfrac{1}{2} x_N^T Q_N x_N \tag{21}$$

subject to

$$x_{k+1} = A_k x_k + B_k u_k \tag{22}$$

which is a standard LQP. Thus LE⁻P and LE⁺P are equivalent and both
will be referred to as LEP. As the solution of the LQP is well known,
we state it now without proof.

The optimal controller for the LEP (LQP) is

$$u_k = -D_k x_k \,;\quad k = 0,\dots,N-1 \tag{23}$$

where

$$D_k = (R_k + B_k^T M_{k+1} B_k)^{-1} B_k^T M_{k+1} A_k \tag{24}$$

and

$$M_k = Q_k + A_k^T\big[M_{k+1} - M_{k+1} B_k (R_k + B_k^T M_{k+1} B_k)^{-1} B_k^T M_{k+1}\big] A_k \tag{25}$$

with

$$M_N = Q_N \tag{26}$$

In view of our assumptions (9), (10) it is easy to show that

$$M_k \ge 0 \,;\quad k = 0,\dots,N \tag{27}$$

so that

$$(R_k + B_k^T M_{k+1} B_k) > 0 \,;\quad k = 0,\dots,N-1 \tag{28}$$
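
The backward recursion (24)-(26) can be sketched for a scalar,
time-invariant system as follows. This is an illustrative special case
only: the function name and the numerical values in the usage example
are assumptions, not part of the report.

```python
def lqp_gains(a, b, q, r, q_final, horizon):
    """Backward recursion (24)-(26) for scalar, time-invariant a, b, q, r.

    Returns (gains, m0) where gains[k] plays the role of D_k in (23)
    and m0 plays the role of M_0.
    """
    m = q_final  # M_N = Q_N, equation (26)
    gains = [0.0] * horizon
    for k in reversed(range(horizon)):
        s = r + b * m * b                      # R_k + B_k^T M_{k+1} B_k, see (28)
        gains[k] = b * m * a / s               # D_k, equation (24)
        m = q + a * (m - m * b * b * m / s) * a  # M_k, equation (25)
    return gains, m


if __name__ == "__main__":
    gains, m0 = lqp_gains(a=1.0, b=1.0, q=1.0, r=1.0, q_final=1.0, horizon=50)
    print(gains[0], m0)
```

With a = b = q = r = 1 the recursion settles on the positive root of
m^2 - m - 1 = 0, so the early gains approach (sqrt(5) - 1)/2.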
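4. Solution of Discrete Time LE±G Problems
4.1 The LE⁻G Problem

We define

$$J^-(X_k;k) = \min_{u_k,\dots,u_{N-1}} \Big[-\mathcal{E}_{|X_k} \prod_{i=k}^{N-1} \mu_x^-(x_i;i)\,\mu_u^-(u_i;i)\,\mu_x^-(x_N;N)\Big] \tag{29}$$

given that the minimizing optimal policy must be of the form

$$u_i = C_i(X_i) \,;\quad i = k,\dots,N-1 \tag{30}$$

At time k+1, then,

$$J^-(X_{k+1};k+1) = \min_{u_{k+1},\dots,u_{N-1}} \Big[-\mathcal{E}_{|X_{k+1}} \prod_{i=k+1}^{N-1} \mu_x^-(x_i;i)\,\mu_u^-(u_i;i)\,\mu_x^-(x_N;N)\Big] \tag{31}$$

so that

$$J^-(X_k;k) = \min_{u_k} \big[\mu_x^-(x_k;k)\,\mu_u^-(u_k;k)\,\mathcal{E}_{|X_k} J^-(X_{k+1};k+1)\big] \tag{32}$$

where

$$x_{k+1} = A_k x_k + B_k u_k + \Gamma_k \alpha_k \,;\quad x_k \ \text{given} \tag{33}$$

Because of the Markov property of (33), which is due to the independence
of $\{\alpha_k\}$, it is clear from (29) that $J^-(X_k;k)$ can be written as $J^-(x_k;k)$†
so that (32) becomes

$$J^-(x_k;k) = \min_{u_k} \Big[\mu_x^-(x_k;k)\,\mu_u^-(u_k;k) \int_{-\infty}^{\infty} p(\alpha_k;k)\,J^-(x_{k+1};k+1)\,d\alpha_k\Big] \tag{34}$$

with

$$J^-(x_N;N) = -\exp\{-\tfrac{1}{2} x_N^T Q_N x_N\} \tag{35}$$

† Alternatively, the development could be continued using (32) and
identical results would be obtained.

We now show that

$$J^-(x_k;k) = -F_k^- \exp\{-\tfrac{1}{2} x_k^T W_k^- x_k\} \tag{36}$$

which is defined for k = 0,...,N, solves (34), where

$$W_k^- \ge 0 \,;\quad k = 0,\dots,N \tag{37}$$

is given by

$$W_k^- = Q_k + A_k^T\big[\tilde W_{k+1}^- - \tilde W_{k+1}^- B_k (R_k + B_k^T \tilde W_{k+1}^- B_k)^{-1} B_k^T \tilde W_{k+1}^-\big] A_k \tag{38}$$

where

$$\tilde W_{k+1}^- = W_{k+1}^- - W_{k+1}^- \Gamma_k (P_k + \Gamma_k^T W_{k+1}^- \Gamma_k)^{-1} \Gamma_k^T W_{k+1}^- \tag{39}$$

and

$$W_N^- = Q_N \tag{40}$$

In addition we have that

$$F_k^- = F_{k+1}^- \big[\,|P_k + \Gamma_k^T W_{k+1}^- \Gamma_k|\;|P_k^{-1}|\,\big]^{-1/2} \,;\quad F_N^- = 1 \tag{41}$$

and the optimal policy is

$$u_k^- = -C_k^- x_k \tag{42}$$

where

$$C_k^- = (R_k + B_k^T \tilde W_{k+1}^- B_k)^{-1} B_k^T \tilde W_{k+1}^- A_k \tag{43}$$

In order to prove that (36) and (42) solve (34) we need the
following, probably well known but underexploited, lemma.

Lemma 1. If $(P_k + \Gamma_k^T W_{k+1}^- \Gamma_k) > 0$, then

$$\frac{1}{\sqrt{(2\pi)^q\,|P_k^{-1}|}} \int_{-\infty}^{\infty} \exp\{-\tfrac{1}{2}\alpha_k^T P_k \alpha_k\}\,\exp\{-\tfrac{1}{2} x_{k+1}^T W_{k+1}^- x_{k+1}\}\,d\alpha_k$$
$$\qquad = \big[\,|P_k + \Gamma_k^T W_{k+1}^- \Gamma_k|\;|P_k^{-1}|\,\big]^{-1/2} \exp\{-\tfrac{1}{2}(A_k x_k + B_k u_k)^T \tilde W_{k+1}^- (A_k x_k + B_k u_k)\} \tag{44}$$

where $\tilde W_{k+1}^-$ is defined in (39).

Proof: See Appendix.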
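
Lemma 1 can be checked numerically in the scalar case. The sketch below
is an illustration only: it assumes q = n = 1, writes z for the scalar
A_k x_k + B_k u_k, and uses a plain trapezoid rule over a wide finite
interval in place of the infinite integral; the function names and the
test values are assumptions.

```python
import math


def lemma1_lhs(z, w, g, p, half_width=12.0, steps=20000):
    """Left side of (44) for scalars, by the trapezoid rule.

    z stands for A_k x_k + B_k u_k, w for W_{k+1}, g for Gamma_k and
    p for P_k. The integration range is +/- half_width noise standard
    deviations, which truncates a negligible tail.
    """
    sigma = 1.0 / math.sqrt(p)
    lo, hi = -half_width * sigma, half_width * sigma
    h = (hi - lo) / steps
    norm = math.sqrt(p / (2.0 * math.pi))  # 1 / sqrt(2*pi*|P^{-1}|)
    total = 0.0
    for i in range(steps + 1):
        alpha = lo + i * h
        x_next = z + g * alpha
        f = norm * math.exp(-0.5 * p * alpha * alpha - 0.5 * w * x_next * x_next)
        total += f * (0.5 if i in (0, steps) else 1.0)
    return total * h


def lemma1_rhs(z, w, g, p):
    """Right side of (44) for scalars, using (39) for W-tilde."""
    w_tilde = w - w * g * g * w / (p + g * w * g)
    return math.sqrt(p / (p + g * w * g)) * math.exp(-0.5 * w_tilde * z * z)


if __name__ == "__main__":
    print(lemma1_lhs(0.7, 2.0, 1.5, 3.0), lemma1_rhs(0.7, 2.0, 1.5, 3.0))
```

The two printed values should agree to many digits, since the integrand
is a smooth gaussian-type function for which the trapezoid rule is very
accurate.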


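Substituting (36) into (34) and using the Lemma and (41) we obtain

$$-\exp\{-\tfrac{1}{2} x_k^T W_k^- x_k\} = \min_{u_k}\Big[-\mu_x^-(x_k;k)\,\mu_u^-(u_k;k)\,\exp\{-\tfrac{1}{2}(A_k x_k + B_k u_k)^T \tilde W_{k+1}^- (A_k x_k + B_k u_k)\}\Big] \tag{45}$$

which, upon taking logarithms, is equivalent to

$$\tfrac{1}{2} x_k^T W_k^- x_k = \min_{u_k} \tfrac{1}{2}\big[x_k^T Q_k x_k + u_k^T R_k u_k + (A_k x_k + B_k u_k)^T \tilde W_{k+1}^- (A_k x_k + B_k u_k)\big] \tag{46}$$

Equation (46) is satisfied by (38), (42), (43) so that the LE⁻G problem
is indeed solved. As in the LEP (LQP) it is easy to verify that, under
assumptions (4), (9), (10), $W_k^-$ and $\tilde W_k^-$ are positive semi-definite for
k = 0,...,N so that

$$(P_k + \Gamma_k^T W_{k+1}^- \Gamma_k) > 0, \quad (R_k + B_k^T \tilde W_{k+1}^- B_k) > 0 \tag{47}$$

which ensures that (38), (39), (41), (43) are well defined.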
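
The recursion (38)-(43) is as simple to compute as the LQP recursion. A
minimal scalar, time-invariant sketch follows; the function name and the
parameter values are illustrative assumptions.

```python
def le_minus_gains(a, b, g, q, r, p, q_final, horizon):
    """Backward recursion (38)-(40), (43) for a scalar LE^-G problem.

    a, b, g stand for A_k, B_k, Gamma_k; q, r for Q_k, R_k; p for P_k
    (the inverse of the noise variance). Returns (gains, w0), where
    gains[k] plays the role of C_k^- in (42).
    """
    w = q_final  # W_N = Q_N, equation (40)
    gains = [0.0] * horizon
    for k in reversed(range(horizon)):
        w_tilde = w - w * g * g * w / (p + g * w * g)        # equation (39)
        s = r + b * w_tilde * b
        gains[k] = b * w_tilde * a / s                       # equation (43)
        w = q + a * (w_tilde - w_tilde * b * b * w_tilde / s) * a  # equation (38)
    return gains, w


if __name__ == "__main__":
    for p in (1e12, 1.0, 1e-9):
        gains, _ = le_minus_gains(1.0, 1.0, 1.0, 1.0, 1.0, p, 1.0, 60)
        print(p, gains[0])
```

For these illustrative values the first gain is close to the LQP gain
when p is very large (negligible noise), is visibly smaller at p = 1,
and is nearly zero when p is very small (very large noise covariance).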
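4.2 The LE⁺G Problem

Here we define

$$J^+(X_k;k) = \min_{u_k,\dots,u_{N-1}} \mathcal{E}_{|X_k} \prod_{i=k}^{N-1} \mu_x^+(x_i;i)\,\mu_u^+(u_i;i)\,\mu_x^+(x_N;N) \tag{48}$$

given that the minimizing optimal policy must be of the form

$$u_i = C_i^+(X_i) \,;\quad i = k,\dots,N-1 \tag{49}$$

so that, proceeding as in Section 4.1, we obtain

$$J^+(x_k;k) = \min_{u_k} \Big[\mu_x^+(x_k;k)\,\mu_u^+(u_k;k) \int_{-\infty}^{\infty} p(\alpha_k;k)\,J^+(x_{k+1};k+1)\,d\alpha_k\Big] \tag{50}$$

$$J^+(x_N;N) = \exp\{\tfrac{1}{2} x_N^T Q_N x_N\} \tag{51}$$

The solution of (50), which is analogous to (36), is

$$J^+(x_k;k) = F_k^+ \exp\{\tfrac{1}{2} x_k^T W_k^+ x_k\} \tag{52}$$

which is defined for k = 0,...,N, where

$$W_k^+ = Q_k + A_k^T\big[\tilde W_{k+1}^+ - \tilde W_{k+1}^+ B_k (R_k + B_k^T \tilde W_{k+1}^+ B_k)^{-1} B_k^T \tilde W_{k+1}^+\big] A_k \tag{53}$$

$$\tilde W_{k+1}^+ = W_{k+1}^+ + W_{k+1}^+ \Gamma_k (P_k - \Gamma_k^T W_{k+1}^+ \Gamma_k)^{-1} \Gamma_k^T W_{k+1}^+ \tag{54}$$

$$W_N^+ = Q_N \tag{55}$$

$$F_k^+ = F_{k+1}^+ \big[\,|P_k - \Gamma_k^T W_{k+1}^+ \Gamma_k|\;|P_k^{-1}|\,\big]^{-1/2} \,;\quad F_N^+ = 1 \tag{56}$$

and the optimal policy is

$$u_k^+ = -C_k^+ x_k$$

where

$$C_k^+ = (R_k + B_k^T \tilde W_{k+1}^+ B_k)^{-1} B_k^T \tilde W_{k+1}^+ A_k \,;\quad k = 0,\dots,N-1 \tag{57}$$

In order to verify that (52)-(57) solve (50) (which we will not
do here because the procedure is almost identical to that for the LE⁻G
problem) it is necessary to use Lemma 2, which we state below, and which
is useful only if

$$P_k - \Gamma_k^T W_{k+1}^+ \Gamma_k > 0 \,;\quad k = 0,\dots,N-1 \tag{58}$$

If (58) is not satisfied, then (52)-(57) do not constitute a meaningful
solution for (50) since it follows from Lemma 2 that

$$J^+(x_k;k) \ \text{is infinite.} \tag{59}$$

Lemma 2. If

$$(P_k - \Gamma_k^T W_{k+1}^+ \Gamma_k) > 0 \tag{60}$$

then

$$\frac{1}{\sqrt{(2\pi)^q\,|P_k^{-1}|}} \int_{-\infty}^{\infty} \exp\{-\tfrac{1}{2}\alpha_k^T P_k \alpha_k\}\,\exp\{\tfrac{1}{2} x_{k+1}^T W_{k+1}^+ x_{k+1}\}\,d\alpha_k$$
$$\qquad = \big[\,|P_k - \Gamma_k^T W_{k+1}^+ \Gamma_k|\;|P_k^{-1}|\,\big]^{-1/2} \exp\{\tfrac{1}{2}(A_k x_k + B_k u_k)^T \tilde W_{k+1}^+ (A_k x_k + B_k u_k)\} \tag{61}$$

Moreover, if

$$P_k - \Gamma_k^T W_{k+1}^+ \Gamma_k \le 0 \tag{62}$$

then the left hand side of (61) is infinite.

Proof: See Appendix.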
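
The role of condition (58) can be seen in a scalar sketch of the LE⁺G
recursion, in which the stage update uses P_k - Γ_k^T W_{k+1}^+ Γ_k in
place of the sum appearing in (39). The function name, the choice to
signal non-existence by returning None, and the numerical values are
illustrative assumptions.

```python
def le_plus_gains(a, b, g, q, r, p, q_final, horizon):
    """Scalar LE^+G backward recursion with the existence test (58).

    Returns the list of gains C_k^+ for k = 0..horizon-1, or None as soon
    as p - g*w*g fails to be positive at some stage, in which case
    Lemma 2 indicates that the criterion is infinite.
    """
    w = q_final  # W_N^+ = Q_N
    gains = [0.0] * horizon
    for k in reversed(range(horizon)):
        margin = p - g * w * g  # left side of condition (58)
        if margin <= 0.0:
            return None
        w_tilde = w + w * g * g * w / margin
        s = r + b * w_tilde * b
        gains[k] = b * w_tilde * a / s
        w = q + a * (w_tilde - w_tilde * b * b * w_tilde / s) * a
    return gains


if __name__ == "__main__":
    print(le_plus_gains(1.0, 1.0, 1.0, 1.0, 1.0, 100.0, 1.0, 40)[0])
    print(le_plus_gains(1.0, 1.0, 1.0, 1.0, 1.0, 0.5, 1.0, 5))
```

For small noise (p = 100) the gain is slightly larger than the LQP gain
for the same data, while for large noise (p = 0.5) condition (58) already
fails at the final stage and no solution is returned.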

5. Properties of Solutions of Discrete Time LE±G Problems
5.1 The LE⁻G Problem

The optimal feedback controller for the LE⁻G problem is a linear
function of the system state,

$$u_k^- = -C_k^- x_k \,;\quad k = 0,\dots,N-1 \tag{63}$$

where $C_k^-$ depends upon the solution of a Riccati type difference equation
(38). The main difference between this and the feedback law for the
LQG problem is that $C_k^-$ depends upon $P_k^{-1}$, the covariance matrix of
the gaussian additive disturbance $\alpha_k$. In the LQG case the optimal
feedback law is independent of the covariance of the input noise and, indeed,
is the same as that for the deterministic LQP (so-called Certainty
Equivalence Principle). Here, in the case where our criterion is the
expected value of minus an exponential function of a negative
semi-definite quadratic form, the Certainty Equivalence Principle does
not hold.

It is interesting to investigate two limiting cases: the first in
which $\lambda_{\min}(P_k) \to \infty$ (input $\alpha_k = 0$, k = 0,...,N-1) and the second in
which $\lambda_{\min}(P_k^{-1}) \to \infty$ (input "infinitely wild").

i) $\lambda_{\min}(P_k) \to \infty$ ; k = 0,...,N-1.

In this case it is clear† from (36), (38), (39) that

$$C_k^- \to D_k \,;\quad k = 0,\dots,N-1 \tag{64}$$

the optimal gains for the LQP (LEP). Note, from (36) and (41), that

$$J^-(x_k;k) \to -\exp\{-\tfrac{1}{2} x_k^T W_k^- x_k\} \,,\quad k = 0,\dots,N \tag{65}$$

Thus for small noise intensities ($P_k^{-1}$ small, k = 0,...,N-1) the solution
of the LE⁻G problem is close to that of the LEP, LQP, and LQG problems.

† These limiting cases can be argued rigorously; the arguments are
straightforward and are left to the reader.

ii) $\lambda_{\min}(P_k^{-1}) \to \infty$ ; k = 0,...,N-1.

Here we shall assume that

$$\Gamma_k^T Q_{k+1} \Gamma_k > 0 \,;\quad k = 0,\dots,N-1 \tag{66}$$

so that, from (37)-(39),

$$\Gamma_k^T W_{k+1}^- \Gamma_k > 0 \,;\quad k = 0,\dots,N-1 \tag{67}$$

As $P_k \to 0$, then, we have

$$\tilde W_{k+1}^- \to W_{k+1}^- - W_{k+1}^- \Gamma_k (\Gamma_k^T W_{k+1}^- \Gamma_k)^{-1} \Gamma_k^T W_{k+1}^- \,;\quad k = 0,\dots,N-1 \tag{68}$$

and, from (36) and (41),

$$J^-(x_k;k) \to 0 \,;\quad k = 0,\dots,N-1 \tag{69}$$

Note that if $\Gamma_k$ has rank n for k = 0,...,N-1, then

$$\tilde W_{k+1}^- = 0 \,;\quad k = 0,\dots,N-1 \tag{70}$$

so that

$$C_k^- \to 0 \,;\quad k = 0,\dots,N-1 \tag{71}$$

An explanation for (71) is that if all components of $x_k$ are
disturbed by an "infinitely wild" additive noise then there is no
point (as far as performance criterion (6) is concerned) in exercising
control to try and counteract these infinite unpredictable disturbances.
Of major interest are the cases in which

$$0 < P_k^{-1} < \infty \,;\quad k = 0,\dots,N-1 \tag{72}$$

for which the new controller (42) offers an alternative to the standard
LQG solution.
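
For a scalar system (n = m = q = 1) both limits can be read off
directly from (39) and (43). The following short calculation is an
illustrative special case, with every matrix replaced by a scalar:

```latex
\tilde W_{k+1}^- \;=\; W_{k+1}^- - \frac{(W_{k+1}^-)^2\,\Gamma_k^2}{P_k + \Gamma_k^2 W_{k+1}^-}
               \;=\; \frac{P_k\,W_{k+1}^-}{P_k + \Gamma_k^2 W_{k+1}^-},
\qquad
C_k^- \;=\; \frac{B_k\,\tilde W_{k+1}^-\,A_k}{R_k + B_k^2\,\tilde W_{k+1}^-}.
% P_k -> infinity:                       \tilde W_{k+1}^- -> W_{k+1}^-,  so C_k^- -> D_k,  as in (64)
% P_k -> 0 with \Gamma_k \neq 0, W > 0:  \tilde W_{k+1}^- -> 0,          so C_k^- -> 0,    as in (71)
```

Between the two limits the gain increases monotonically with $P_k$,
since $\tilde W_{k+1}^-$ is increasing in $P_k$ and $C_k^-$ is increasing in
$\tilde W_{k+1}^-$ (for $A_k B_k > 0$).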

5.2 The LE⁺G Problem

As in the LE⁻G problem the Certainty Equivalence Principle does
not hold because $C_k^+$ depends upon the covariance of the additive process
noise. We again consider the two limiting cases of zero noise and
"infinite" noise.

i) $\lambda_{\min}(P_k) \to \infty$ ; k = 0,...,N-1.

In this case, as the covariance matrix tends to zero, we obtain from
(52)-(57) that

$$C_k^+ \to D_k \,;\quad k = 0,\dots,N-1 \tag{73}$$

and

$$J^+(x_k;k) \to \exp\{\tfrac{1}{2} x_k^T W_k^+ x_k\} \,,\quad k = 0,\dots,N-1 \tag{74}$$

so that for small noise intensity the solution of the LE⁺G problem is
close to that of the LEP, LQP, and LQG problems.

ii) $\lambda_{\min}(P_k^{-1}) \to \infty$ ; k = 0,...,N-1.

For $P_k$ sufficiently small (i.e. large covariance) the solution
of (50) can cease to exist (indeed (48) can become infinite). To see
this, let us assume that

$$\Gamma_k^T Q_{k+1} \Gamma_k > 0 \,;\quad k = 0,\dots,N-1 \tag{75}$$

and that

$$P_j - \Gamma_j^T W_{j+1}^+ \Gamma_j > 0 \,;\quad j = k+1,\dots,N-1 \tag{76}$$

From (75), (76), (53), (54) we have that

$$\Gamma_k^T W_{k+1}^+ \Gamma_k > 0 \tag{77}$$

so that for sufficiently small $P_k$

$$P_k - \Gamma_k^T W_{k+1}^+ \Gamma_k \le 0 \tag{78}$$

which implies, from Lemma 2, that the left hand side of (61) is infinite.
Clearly, then, from (50)

$$J^+(x_k;k) \ \text{is infinite.} \tag{79}$$

Since k is arbitrary, k ∈ {0,...,N-1}, we can conclude that if the noise
covariance is sufficiently large, the performance criterion (14) is
infinite, regardless of the choice of controls $\{u_k\}$. We shall have more
to say about this interesting case when we treat the continuous time
LE⁺G problem in Section 8.

6. The Discrete Time LE±G Problems and Deterministic Games
6.1 The LE⁻G Problem

The solution of the LE⁻G problem is, by inspection (or short
calculation), equivalent to the solution of the following cooperative
deterministic game (LQP):

$$\underset{\{u_k\},\{\alpha_k\}}{\text{Minimize}} \Big[\tfrac{1}{2}\sum_{k=0}^{N-1}\big(x_k^T Q_k x_k + u_k^T R_k u_k + \alpha_k^T P_k \alpha_k\big) + \tfrac{1}{2} x_N^T Q_N x_N\Big] \tag{80}$$

subject to the dynamic constraint

$$x_{k+1} = A_k x_k + B_k u_k + \Gamma_k \alpha_k \,;\quad x_0 \ \text{given} \tag{81}$$

It turns out that

$$\tfrac{1}{2} x_k^T W_k^- x_k = \min_{\{u_i\},\{\alpha_i\}} \Big[\tfrac{1}{2}\sum_{i=k}^{N-1}\big(x_i^T Q_i x_i + u_i^T R_i u_i + \alpha_i^T P_i \alpha_i\big) + \tfrac{1}{2} x_N^T Q_N x_N\Big] \tag{82}$$

Note that in the above formulation we determine optimal control laws

$$u_k^- = -C_k^- x_k \,,\quad \alpha_k^- = -\Lambda_k^- x_k \,;\quad k = 0,\dots,N-1 \tag{83}$$

We now have a new interpretation for the linear-quadratic game:
If player $u_k$ assumes that player $\alpha_k$ will cooperate in minimizing the
quadratic criterion (even though $u_k$ knows that $\alpha_k$ behaves like a
gaussian random variable), then the feedback controller (policy) that
is obtained for $u_k$, upon solving (80) and (81), namely

$$u_k^- = -C_k^- x_k \,;\quad k = 0,\dots,N-1 \tag{84}$$

is optimal also for the LE⁻G problem. Thus the policy for $u_k$ obtained
by treating $\alpha_k$ as a cooperative player makes sense when interpreted
as the solution of the stochastic LE⁻G problem.
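
The "short calculation" can be carried out numerically for one stage of
a scalar problem: minimize the stage cost of (80) plus the cost-to-go
jointly over (u, α), and compare with one step of (38), (39), (43). The
function names and the numbers are illustrative assumptions.

```python
def cooperative_stage(a, b, g, q, r, p, w_next):
    """One stage of the cooperative game (80)-(81) for scalars.

    Minimizes r*u^2 + p*alpha^2 + w_next*(a*x + b*u + g*alpha)^2 jointly
    over (u, alpha). Stationarity gives the 2x2 linear system
    H [u, alpha]^T = -w_next*a*x*[b, g]^T, solved here by Cramer's rule.
    Returns (c, lam, w) with u = -c*x, alpha = -lam*x and value w*x^2/2.
    """
    h11 = r + b * w_next * b
    h12 = b * w_next * g
    h22 = p + g * w_next * g
    det = h11 * h22 - h12 * h12
    c = (h22 * b - h12 * g) * w_next * a / det
    lam = (h11 * g - h12 * b) * w_next * a / det
    closed = a - b * c - g * lam  # x_{k+1} / x_k under both feedback laws
    w = q + r * c * c + p * lam * lam + w_next * closed * closed
    return c, lam, w


def le_minus_stage(a, b, g, q, r, p, w_next):
    """One step of (39), (43), (38) for scalars; returns (C_k^-, W_k^-)."""
    w_tilde = w_next - w_next * g * g * w_next / (p + g * w_next * g)
    s = r + b * w_tilde * b
    return b * w_tilde * a / s, q + a * (w_tilde - w_tilde * b * b * w_tilde / s) * a


if __name__ == "__main__":
    print(cooperative_stage(0.9, 1.0, 0.5, 1.0, 2.0, 3.0, 1.5))
    print(le_minus_stage(0.9, 1.0, 0.5, 1.0, 2.0, 3.0, 1.5))
```

For these values both routes give the gain 0.36 and the value
coefficient 1.648, which is the scalar content of (82) and (84).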

6.2 The LE⁺G Problem

Here, the deterministic game that has an equivalent solution
is noncooperative, namely,

$$\min_{\{u_k\}} \max_{\{\alpha_k\}} \Big[\tfrac{1}{2}\sum_{k=0}^{N-1}\big(x_k^T Q_k x_k + u_k^T R_k u_k - \alpha_k^T P_k \alpha_k\big) + \tfrac{1}{2} x_N^T Q_N x_N\Big] \tag{85}$$

subject to (81), where $u_k^+$ and $\alpha_k^+$ are determined as feedback laws
(policies)

$$u_k^+ = -C_k^+ x_k \,,\quad \alpha_k^+ = -\Lambda_k^+ x_k \,;\quad k = 0,\dots,N-1 \tag{86}$$

It is well known that if

$$P_k - \Gamma_k^T W_{k+1}^+ \Gamma_k > 0 \,;\quad k = 0,\dots,N-1 \tag{87}$$

then

$$\tfrac{1}{2} x_k^T W_k^+ x_k = \min_{\{u_i\}} \max_{\{\alpha_i\}} \Big[\tfrac{1}{2}\sum_{i=k}^{N-1}\big(x_i^T Q_i x_i + u_i^T R_i u_i - \alpha_i^T P_i \alpha_i\big) + \tfrac{1}{2} x_N^T Q_N x_N\Big] \tag{88}$$

If the determinant of the left hand side of (87) is nonzero but the
matrix fails to be positive definite then, as is well known, (85) ceases
to be bounded. However, if the left hand side of (87) is singular
for some values of k ∈ {0,...,N-1} then (85) may exist. Thus, provided

$$|P_k - \Gamma_k^T W_{k+1}^+ \Gamma_k| \ne 0 \,;\quad k = 0,\dots,N-1 \tag{89}$$

we have, from Lemma 2 and (87), (88), that (48) is finite (for k = 0) if
and only if (85) is finite.

Our interpretation of the above noncooperative deterministic game
is as follows: If player $u_k$ assumes that $\alpha_k$ will not cooperate in minimizing
the quadratic criterion (even though $u_k$ knows that $\alpha_k$ behaves like a
gaussian random variable) then the feedback controller (policy) that is
obtained for $u_k$, upon solving (85), namely

$$u_k^+ = -C_k^+ x_k \tag{90}$$

is optimal for the LE⁺G problem. Thus this rather conservative game
formulation in which the noise is treated as a noncooperative player
gives rise to a control policy which solves the LE⁺G stochastic control
problem. When looked at from this viewpoint the min-max game solution for
$u_k$ ("worst case design") does not appear to be too pessimistic, since the
performance criterion of the LE⁺G problem is rather appealing.
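
The same one-stage check can be made for the min-max game (85). In the
scalar sketch below the saddle point is obtained from the stationarity
conditions, which is legitimate only when condition (87) holds; the
function names and numbers are illustrative assumptions.

```python
def minmax_stage(a, b, g, q, r, p, w_next):
    """Saddle point of one stage of the game (85) for scalars.

    Stationary point of r*u^2 - p*alpha^2 + w_next*(a*x + b*u + g*alpha)^2,
    a minimum in u and a maximum in alpha when p - g*w_next*g > 0 (87).
    Returns (c, lam, w) with u = -c*x, alpha = -lam*x and value w*x^2/2.
    """
    if p - g * w_next * g <= 0.0:
        raise ValueError("condition (87) fails: no saddle point")
    h11 = r + b * w_next * b
    h12 = b * w_next * g
    h22 = g * w_next * g - p  # negative under (87)
    det = h11 * h22 - h12 * h12
    c = (h22 * b - h12 * g) * w_next * a / det
    lam = (h11 * g - h12 * b) * w_next * a / det
    closed = a - b * c - g * lam
    w = q + r * c * c - p * lam * lam + w_next * closed * closed
    return c, lam, w


def le_plus_stage(a, b, g, q, r, p, w_next):
    """One step of the scalar LE^+G recursion; returns (C_k^+, W_k^+)."""
    w_tilde = w_next + w_next * g * g * w_next / (p - g * w_next * g)
    s = r + b * w_tilde * b
    return b * w_tilde * a / s, q + a * (w_tilde - w_tilde * b * b * w_tilde / s) * a


if __name__ == "__main__":
    print(minmax_stage(0.9, 1.0, 0.5, 1.0, 2.0, 3.0, 1.5))
    print(le_plus_stage(0.9, 1.0, 0.5, 1.0, 2.0, 3.0, 1.5))
```

The controller gain and the value coefficient from the two routes
coincide, while the maximizing player's gain is negative, meaning that
α pushes the state in the direction in which a x already points.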


7. Formulation of Continuous Time LE±G Problems
7.1 The LE⁻G Problem

In continuous time, the LE⁻G problem takes the form

$$\underset{u(\cdot,\cdot)}{\text{Minimize}} \; -\mathcal{E}_{|x(t_0)} \exp\Big\{-\tfrac{1}{2}\Big[\int_{t_0}^{t_f} (x^T Q x + u^T R u)\,dt + x^T(t_f) Q_f x(t_f)\Big]\Big\} \tag{91}$$

subject to

$$\dot x = Ax + Bu + \Gamma\alpha \,;\quad x(t_0) \ \text{given} \tag{92}$$

where, for notational simplicity, time dependence of the variables
has been suppressed* and where $\alpha(\cdot)$ is a gaussian white noise process
having

$$\mathcal{E}[\alpha(t)] = 0 \,;\quad t \in [t_0,t_f] \tag{93}$$

$$\mathcal{E}[\alpha(t)\alpha^T(s)] = P^{-1}\delta(t-s) \,;\quad t,s \in [t_0,t_f] \tag{94}$$

where $\delta$ is the Dirac delta function.

* Note that $Q \ge 0$, $R > 0$, $P > 0$ for all $t \in [t_0,t_f]$, and $Q_f \ge 0$.

Note that in solving (91) we seek an optimal control policy

$$u^-(X,t) = C^-(X,t) \,;\quad t \in [t_0,t_f] \,;\quad X \triangleq \{x(\tau);\ \tau \in [t_0,t]\} \tag{95}$$

where $C^-$, taking values in $R^m$, is a measurable function of its arguments.

7.2 The LE⁺G Problem

Here, the performance criterion to be minimized is

$$\mathcal{E}_{|x(t_0)} \exp\Big\{\tfrac{1}{2}\Big[\int_{t_0}^{t_f} (x^T Q x + u^T R u)\,dt + x^T(t_f) Q_f x(t_f)\Big]\Big\} \tag{96}$$

and the required control policy is

$$u^+(X,t) = C^+(X,t) \,;\quad t \in [t_0,t_f] \tag{97}$$

8. Solution of Continuous Time LE±G Problems and Relation to Differential
Games

8.1 Solution of LE±G Problems

We can solve the continuous time LE±G problems either by formally
taking the limit of the solutions for the discrete time cases or by
solving the "generalized" Hamilton-Jacobi-Bellman equation (see
appendix for derivation)

$$-\frac{\partial J^\sigma}{\partial t}(x,t) = \min_u \Big\{\sigma\,\tfrac{1}{2}(x^T Q x + u^T R u)\,J^\sigma(x,t) + \big(J_x^\sigma(x,t)\big)^T (Ax + Bu) + \tfrac{1}{2}\,\mathrm{tr}\big[J_{xx}^\sigma(x,t)\,\Gamma P^{-1}\Gamma^T\big]\Big\} \tag{98}$$

where

$$\sigma = \begin{cases} -1 \,; & \text{for LE⁻G problem} \\ +1 \,; & \text{for LE⁺G problem} \end{cases} \tag{99}$$

which is satisfied by

$$J^\sigma(x,t) = \sigma\,\mathcal{E}_{|x(t)} \exp\Big\{\sigma\,\tfrac{1}{2}\Big[\int_t^{t_f} \big(x^T Q x + (u^\sigma)^T R u^\sigma\big)\,d\tau + x^T(t_f) Q_f x(t_f)\Big]\Big\} \tag{100}$$

where

$$u^\sigma(X,\tau) = C^\sigma(X,\tau) \,;\quad \tau \in [t,t_f] \tag{101}$$

is the optimal policy.

Using either method we find that

$$u^\sigma(x,t) = -R^{-1}B^T S^\sigma x \,;\quad t \in [t_0,t_f] \tag{102}$$

and

$$J^\sigma(x,t) = \sigma F^\sigma \exp\{\sigma\,\tfrac{1}{2} x^T S^\sigma x\} \tag{103}$$

where

$$-\dot S^\sigma = Q + S^\sigma A + A^T S^\sigma - S^\sigma (BR^{-1}B^T - \sigma\,\Gamma P^{-1}\Gamma^T) S^\sigma \,;\quad S^\sigma(t_f) = Q_f \tag{104}$$

and

$$-\dot F^\sigma = \tfrac{\sigma}{2}\,F^\sigma\,\mathrm{tr}(S^\sigma \Gamma P^{-1}\Gamma^T) \,;\quad F^\sigma(t_f) = 1 \tag{105}$$
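
For a scalar, time-invariant system the Riccati equation (104) can be
integrated backward from t_f to observe both behaviours: convergence for
σ = -1 and, for σ = +1 with large noise, a finite escape time. The sketch
below uses a fixed-step fourth-order Runge-Kutta scheme; the function
name, step count, blow-up threshold and data are illustrative
assumptions.

```python
def riccati_backward(a, b, g, q, r, p, q_final, sigma, duration,
                     steps=10000, blowup=1e6):
    """Integrate the scalar form of (104) backward over `duration`.

    In reversed time tau = t_f - t the equation reads
    dS/dtau = q + 2*a*S - (b*b/r - sigma*g*g/p) * S^2, S(0) = q_final,
    with sigma = -1 for LE^-G and +1 for LE^+G. Returns S at
    tau = duration, or None if |S| exceeds `blowup` (finite escape time).
    """
    coeff = b * b / r - sigma * g * g / p

    def rhs(s):
        return q + 2.0 * a * s - coeff * s * s

    h = duration / steps
    s = q_final
    for _ in range(steps):
        k1 = rhs(s)
        k2 = rhs(s + 0.5 * h * k1)
        k3 = rhs(s + 0.5 * h * k2)
        k4 = rhs(s + h * k3)
        s += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        if abs(s) > blowup:
            return None
    return s


if __name__ == "__main__":
    print(riccati_backward(0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, -1, 10.0))
    print(riccati_backward(0.0, 1.0, 1.0, 1.0, 1.0, 4.0, 0.0, +1, 2.0))
    print(riccati_backward(0.0, 1.0, 1.0, 1.0, 1.0, 0.5, 0.0, +1, 2.0))
```

In the last call b*b/r - g*g/p = -1, so the reversed-time equation is
dS/dtau = 1 + S^2 with solution tan(tau), which escapes before tau = 2;
the first call converges to 1/sqrt(2), the positive root of 1 - 2 S^2 = 0.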


8.2 Relation to Continuous Time Differential Games

By inspection we see that the optimal controller for the LE⁻G
problem ($\sigma$ negative) is obtained from the solution of the following
cooperative differential game

$$\underset{u(\cdot),\alpha(\cdot)}{\text{Minimize}} \int_{t_0}^{t_f} \tfrac{1}{2}(x^T Q x + u^T R u + \alpha^T P \alpha)\,dt + \tfrac{1}{2} x^T(t_f) Q_f x(t_f) \tag{106}$$

subject to

$$\dot x = Ax + Bu + \Gamma\alpha \,;\quad x(t_0) \ \text{given} \tag{107}$$

where we require the optimal controls in feedback (policy) form

$$u^-(t) = -C^-(t)x(t) \,,\quad \alpha^-(t) = -\Lambda^-(t)x(t) \,;\quad t \in [t_0,t_f] \tag{108}$$

which results in

$$\tfrac{1}{2} x^T(t) S^-(t) x(t) = \min_{u(\cdot),\alpha(\cdot)} \Big[\int_t^{t_f} \tfrac{1}{2}(x^T Q x + u^T R u + \alpha^T P \alpha)\,d\tau + \tfrac{1}{2} x^T(t_f) Q_f x(t_f)\Big] \tag{109}$$

Because of our assumptions of positive (semi-)definiteness of Q, R, P
and $Q_f$, it is known that $S^-(t)$ exists for all $t \in [t_0,t_f]$ so that
(91) is well posed.

In the case of the LE⁺G problem the appropriate differential game
is noncooperative, namely

$$\min_{u(\cdot)} \max_{\alpha(\cdot)} \int_{t_0}^{t_f} \tfrac{1}{2}(x^T Q x + u^T R u - \alpha^T P \alpha)\,dt + \tfrac{1}{2} x^T(t_f) Q_f x(t_f) \tag{110}$$

subject to (107). The optimal feedback laws are

$$u^+(t) = -C^+(t)x \,,\quad \alpha^+(t) = -\Lambda^+(t)x \,;\quad t \in [t_0,t_f] \tag{111}$$

and

$$\tfrac{1}{2} x^T(t) S^+(t) x(t) = \min_{u(\cdot)} \max_{\alpha(\cdot)} \Big[\int_t^{t_f} \tfrac{1}{2}(x^T Q x + u^T R u - \alpha^T P \alpha)\,d\tau + \tfrac{1}{2} x^T(t_f) Q_f x(t_f)\Big] \tag{112}$$

provided that

$$-\dot S^+ = Q + S^+ A + A^T S^+ - S^+ (BR^{-1}B^T - \Gamma P^{-1}\Gamma^T) S^+ \,;\quad S^+(t_f) = Q_f \tag{113}$$

has a solution in $[t,t_f]$.

Note that by standard results on Riccati differential equations,
(113) has a solution for all $t \in [t_0,t_f]$ if

$$(BR^{-1}B^T - \Gamma P^{-1}\Gamma^T) \ge 0 \,,\quad t \in [t_0,t_f] \tag{114}$$

and so (114) guarantees existence of $J^+(x,t)$ ; $t \in [t_0,t_f]$. If (114)
is not satisfied (say for $P^{-1}$ sufficiently large) then (113) may
exhibit a finite escape time ($S^+(t) \to \infty$ for some $t \in [t_0,t_f]$) which would
imply that (110) is unbounded and also that $J^+(x_0;t_0)$ is unbounded.

9. Properties of the Solutions of the Continuous Time LE±G Problems
9.1 The LE⁻G Problem

As in the discrete time case we have that as $P^{-1} \to 0$; $t \in [t_0,t_f]$
($\lambda_{\min}(P) \to \infty$; $t \in [t_0,t_f]$) the optimal controller tends to that for the
LQG problem. As $\lambda_{\min}(P^{-1}) \to \infty$; $t \in [t_0,t_f]$, problem (106) becomes
singular and care must be taken in studying the limit; see [2] for a
careful treatment of the singular case. Using arguments very similar to
those given in [2] it is possible to show that as $\lambda_{\min}(P^{-1}) \to \infty$; $t \in [t_0,t_f]$,
the limit of $S^-$ must exist, $t \in [t_0,t_f)$. Now if we make the assumption
that

$$\Gamma \ \text{has rank } n \,;\quad t \in [t_0,t_f) \tag{115}$$

then from (104) (with $\sigma$ negative) and the fact that the limit of $S^-$;
$t \in [t_0,t_f)$ must exist, it follows that

$$\operatorname{Lim} S^- = 0 \,;\quad t \in [t_0,t_f) \tag{116}$$

which tells us that

$$R^{-1}B^T S^- \to 0 \,;\quad t \in [t_0,t_f) \tag{117}$$

which is analogous to the discrete time case (71).

9.2 The LE⁺G Problem

As $\lambda_{\min}(P) \to \infty$, $t \in [t_0,t_f]$, we have that the solution of the
LE⁺G problem, as in the LE⁻G case, tends to the solution of the LQG
problem. As noise intensity increases, $\lambda_{\min}(P^{-1}) \to \infty$; $t \in [t_0,t_f]$,
(114) will cease to be satisfied, and ultimately (113) will exhibit
a finite escape time signifying that $J^+(x_0,t_0)$ has ceased to exist;
i.e., for sufficiently large noise intensity, performance criterion
(96) is unbounded. Note that contrary to the LE⁻G case, (117), we have
that

$$R^{-1}B^T S^+ \to \infty \,;\quad t \in [t_0,t_f] \tag{118}$$

as 

^  It  .t,]  (119) 
10. Some Stability Properties of Undisturbed Linear System Controlled
by Solution of LE±G Problems

In this section we assume that all parameters are time invariant
and we investigate, briefly, stability of the system

$$\dot x = (A - BC_\infty^\sigma)x \,;\quad \sigma \ \text{negative or positive.} \tag{120}$$

10.1 Stability Properties of $C_\infty^-$

Here we assume that the pair (A,B) is controllable and that
$Q > 0$. These assumptions guarantee the existence of $S_\infty^-$, the unique
positive definite steady state solution of the Riccati equation. That
is, $S_\infty^- > 0$ satisfies

$$Q + S_\infty^- A + A^T S_\infty^- - S_\infty^- (BR^{-1}B^T + \Gamma P^{-1}\Gamma^T) S_\infty^- = 0 \tag{121}$$

and we have the steady state feedback gain

$$C_\infty^- = R^{-1}B^T S_\infty^- \tag{122}$$

We now define

$$L^- \triangleq \tfrac{1}{2} x^T S_\infty^- x \tag{123}$$

which is positive definite. Along trajectories of (120), we have

$$\dot L^- = \tfrac{1}{2} x^T (S_\infty^- A + A^T S_\infty^-) x - x^T S_\infty^- BR^{-1}B^T S_\infty^- x \tag{124}$$

which, upon using (121), is

$$\dot L^- = -\tfrac{1}{2} x^T \big[Q + S_\infty^- (BR^{-1}B^T - \Gamma P^{-1}\Gamma^T) S_\infty^-\big] x \tag{125}$$

Now if

$$BR^{-1}B^T - \Gamma P^{-1}\Gamma^T \ge 0 \tag{126}$$

we have

$$\dot L^- < 0, \ \text{for all } x \ne 0 \tag{127}$$

and system (120), with controller $C_\infty^-$, is asymptotically stable.

Note that simple examples show that (120) can be unstable if
condition (126) is violated.
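
One such simple example can be constructed for a scalar system, where
(121) is a quadratic equation in $S_\infty^-$. The sketch below is an
illustration under assumed data (A = B = Γ = Q = R = 1); the function
name is hypothetical.

```python
import math


def steady_state_le_minus(a, b, g, q, r, p):
    """Positive root of the scalar form of (121) and the closed-loop pole.

    Scalar (121): q + 2*a*s - (b*b/r + g*g/p) * s^2 = 0.
    The closed-loop system (120) is x_dot = (a - b*b*s/r) * x, so the
    second returned value is negative exactly when (120) is stable.
    """
    coeff = b * b / r + g * g / p
    s = (a + math.sqrt(a * a + coeff * q)) / coeff
    return s, a - b * b * s / r


if __name__ == "__main__":
    # p = 10: small noise, b*b/r - g*g/p = 0.9 >= 0, condition (126) holds.
    print(steady_state_le_minus(1.0, 1.0, 1.0, 1.0, 1.0, 10.0))
    # p = 0.1: large noise, b*b/r - g*g/p = -9 < 0, condition (126) fails.
    print(steady_state_le_minus(1.0, 1.0, 1.0, 1.0, 1.0, 0.1))
```

With small noise the closed-loop pole is negative; with large noise the
gain shrinks so much that the open-loop unstable plant is no longer
stabilized and the pole is positive.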

10.2 Stability Properties of $C_\infty^+$

In this case we assume condition (114), namely

$$BR^{-1}B^T - \Gamma P^{-1}\Gamma^T \ge 0 \tag{128}$$

and also that $Q > 0$. Note that because of (128) we can write

$$NN^T = BR^{-1}B^T - \Gamma P^{-1}\Gamma^T \tag{129}$$

If we assume now that the pair (A,N) is controllable then it follows that
there exists a unique positive-definite matrix $S_\infty^+$ which satisfies

$$Q + S_\infty^+ A + A^T S_\infty^+ - S_\infty^+ (BR^{-1}B^T - \Gamma P^{-1}\Gamma^T) S_\infty^+ = 0 \tag{130}$$

and

$$C_\infty^+ = R^{-1}B^T S_\infty^+ \tag{131}$$

Define

$$L^+ \triangleq \tfrac{1}{2} x^T S_\infty^+ x \tag{132}$$

Along trajectories of (120) we have that

$$\dot L^+ = \tfrac{1}{2} x^T (S_\infty^+ A + A^T S_\infty^+) x - x^T S_\infty^+ BR^{-1}B^T S_\infty^+ x \tag{133}$$

which, upon using (130), is

$$\dot L^+ = -\tfrac{1}{2} x^T \big[Q + S_\infty^+ (BR^{-1}B^T + \Gamma P^{-1}\Gamma^T) S_\infty^+\big] x \tag{134}$$

$$< 0 \ \text{for all } x \ne 0 \tag{135}$$

Here, $L^+$ is a Liapunov function and (120) with controller $C_\infty^+$ is
asymptotically stable. Note the interesting point that (126) is
sufficient to guarantee asymptotic stability of (120) with controllers
$C_\infty^-$ or $C_\infty^+$. In the first case, (126) is used to guarantee negativity of
$\dot L^-$ while in the second it is used to guarantee existence of $S_\infty^+$.
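
In the scalar case the conclusion can be confirmed by hand. The
following sketch assumes scalar A, B, Γ, Q, R, P with Q > 0 and, so that
(130) is a genuine quadratic, the strict inequality c > 0 for the scalar
c defined below:

```latex
c \;\triangleq\; \frac{B^2}{R} - \frac{\Gamma^2}{P} \;>\; 0,
\qquad
Q + 2AS_\infty^+ - c\,(S_\infty^+)^2 = 0
\;\;\Longrightarrow\;\;
S_\infty^+ = \frac{A + \sqrt{A^2 + cQ}}{c} \;>\; 0 .

% Closed-loop pole of (120), using B^2/R >= c:
A - \frac{B^2}{R}\,S_\infty^+
\;=\; A - \frac{B^2/R}{c}\,\bigl(A + \sqrt{A^2 + cQ}\bigr)
\;\le\; A - \bigl(A + \sqrt{A^2 + cQ}\bigr)
\;=\; -\sqrt{A^2 + cQ} \;<\; 0 .
```

Thus whenever the scalar steady state solution exists, the closed loop
is stable, in agreement with (134), (135).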


11. Interpretation of Stability Results in Terms of Infinite Time
LE±G Problems

Clearly, from (103), (105)

$$J^-(x,t) \to 0 \ \text{as } t \to -\infty \tag{136}$$

and

$$J^+(x,t) \to \infty \ \text{as } t \to -\infty \tag{137}$$

In order for LE±G problems to make sense, therefore, we define
our infinite time criterion as

$$\lim_{t \to -\infty} \sigma \Big\{\mathcal{E}_{|x(t)} \exp\Big\{\sigma\,\tfrac{1}{2}\Big[\int_t^{t_f} (x^T Q x + u^T R u)\,d\tau + x^T(t_f) Q_f x(t_f)\Big]\Big\}\Big\}^{\frac{1}{t_f - t}} \tag{138}$$

Note that from (103), (105), (138) is equal to

$$\sigma \exp\{\tfrac{\sigma}{2}\,\mathrm{tr}(S_\infty^\sigma \Gamma P^{-1}\Gamma^T)\}. \tag{139}$$

In the case where $\sigma$ is negative and the noise intensity is large
an unstable control law may be optimal, because in (138) the quantity whose
expected value is calculated is bounded below by minus one and above by
zero regardless of the control that is applied.

Note that when $\sigma$ is positive an unstable control law cannot be
optimal because the quantity whose expected value is calculated is
unbounded; this is confirmed by (135) which indicates that if an optimal
controller exists for the infinite time LE⁺G problem it must be stable.

12.  Conclusion 

In  this  paper  we  have  presented  explicit  (modulo  solution  of  Riccati 
difference  or  differential  equations)  solutions  of  stochastic  control 
problems  having  linear  dynamics,  additive  gaussian  noise  and  exponential 
objective  functions.  These  solutions  are  linear  feedback  control  policies 
which  depend  upon  the  covariance  matrix  of  the  additive  process  noise 
so that the Certainty Equivalence Principle of LQG theory does not hold.
In certain applications these new controllers may be preferable, especially
perhaps in economics where multiplicative objective functions are of
intrinsic interest.
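
The dependence of the feedback gain on the noise covariance can be seen in a scalar, discrete-time sketch. The recursion below is worked out for this illustration only, under the assumptions x[k+1] = a*x[k] + b*u[k] + w[k], w[k] ~ N(0, v), and criterion sigma * E exp(sigma/2 * (sum of q*x^2 + r*u^2, plus qf*x[N]^2)); it is not a transcription of the report's numbered equations, and the names `le_g_gains`, `W` and `W_t` are hypothetical.

```python
def le_g_gains(a, b, q, r, qf, v, sigma, N):
    """Backward recursion for the scalar exponential-cost problem.

    Returns gains K[0..N-1] with u[k] = K[k] * x[k], or None when the
    expectation diverges (this can happen only for sigma = +1)."""
    W = qf
    gains = []
    for _ in range(N):
        # Averaging exp(sigma/2 * W * (z + w)^2) over w ~ N(0, v) leaves
        # exp(sigma/2 * W_t * z^2), provided 1 - sigma*W*v > 0.
        denom = 1.0 - sigma * W * v
        if denom <= 0.0:
            return None
        W_t = W / denom
        # Minimize r*u^2 + W_t*(a*x + b*u)^2 over u.
        K = -a * b * W_t / (r + b * b * W_t)
        W = q + r * K * K + W_t * (a + b * K) ** 2
        gains.append(K)
    gains.reverse()
    return gains

lqg = le_g_gains(1.2, 1.0, 1.0, 1.0, 1.0, 0.0, -1, 5)      # v = 0: noise-free gain
minus = le_g_gains(1.2, 1.0, 1.0, 1.0, 1.0, 2.0, -1, 5)    # sigma = -1
plus = le_g_gains(1.2, 1.0, 1.0, 1.0, 1.0, 0.2, +1, 5)     # sigma = +1
print(lqg[0], minus[0], plus[0])
```

With v = 0 both signs of sigma give the same gain as the deterministic quadratic problem. For v > 0 the gains move apart, which is the failure of certainty equivalence in this scalar setting.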

By demonstrating certain equivalences between our stochastic control
formulations and deterministic differential games we are able to give
a stochastic interpretation to min-max ("worst case") design of linear
systems. This suggests that the "pessimistic" min-max design is not
unattractive since it corresponds, in a stochastic setting, to minimization
of the expected value of an exponential function of a quadratic form,
which is quite an appealing criterion. Another significant result of these
equivalences  is  that  existence  of  solutions  of  the  stochastic  control 
problems  implies  and  is  implied  by  existence  of  solutions  of  the  differential 
games.  Hopefully  these  notions  can  be  extended  to  provide  existence  results 
for  nonlinear  stochastic  control  problems  and  nonlinear  differential  games. 
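
The equivalence with deterministic games can be checked numerically in a scalar, one-stage setting. The sketch below is an illustration under stated assumptions, not the report's derivation: the noise w has variance v = 1/p, the next-stage weight is W, and z stands for the noise-free next state. The function names and the brute-force grid search are hypothetical choices made for clarity.

```python
def stochastic_weight(W, v, sigma):
    """Weight W_t such that averaging exp(sigma/2 * W * (z + w)^2) over
    w ~ N(0, v) gives a constant times exp(sigma/2 * W_t * z^2).
    Valid when 1 - sigma*W*v > 0."""
    return W / (1.0 - sigma * W * v)

def game_weight(W, p, sigma, z=1.0, half_width=5.0, n=100001):
    """Brute-force value of the inner deterministic game in w, per z^2.

    sigma = +1: max over w of W*(z+w)^2 - p*w^2  (w opposes the controller)
    sigma = -1: min over w of W*(z+w)^2 + p*w^2  (w cooperates)"""
    best = None
    for i in range(n):
        w = -half_width + 2.0 * half_width * i / (n - 1)
        val = W * (z + w) ** 2 - sigma * p * w * w
        if best is None:
            best = val
        elif sigma > 0 and val > best:
            best = val
        elif sigma < 0 and val < best:
            best = val
    return best / (z * z)

for sigma in (+1, -1):
    print(sigma, stochastic_weight(1.0, 0.25, sigma), game_weight(1.0, 4.0, sigma))
```

In both cases the weight produced by taking the expectation agrees with the value of the corresponding game to within the grid resolution. The existence condition 1 - W*v > 0 for sigma = +1 is the same as the concavity condition p > W of the maximization over w.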


Certain stability properties of the steady state solutions of
the stochastic control problem are also investigated. In particular,
we point out that the steady state controller for the LE$^-$G problem
can result in an unstable dynamic system while the steady state controller
for the LE$^+$G problem, if it exists, always stabilizes the dynamic
system. In this sense, the LE$^+$G formulation is preferable.
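
A scalar, discrete-time sketch makes this contrast concrete. It assumes dynamics x[k+1] = a*x[k] + b*u[k] + w[k] with noise variance v, iterates a scalar Riccati-type recursion (worked out for this illustration, not copied from the report) to an approximate steady state, and returns the closed-loop coefficient a + b*K. The function name and the numerical values are hypothetical.

```python
def steady_state_closed_loop(a, b, q, r, v, sigma, iters=500):
    """Iterate the scalar recursion from W = q and return a + b*K for the
    resulting gain K, or None if 1 - sigma*W*v <= 0 along the way (no
    steady state solution is found for these data)."""
    W = q
    K = 0.0
    for _ in range(iters):
        denom = 1.0 - sigma * W * v
        if denom <= 0.0:
            return None
        W_t = W / denom                        # noise-adjusted weight
        K = -a * b * W_t / (r + b * b * W_t)   # minimizing feedback gain
        W = q + r * K * K + W_t * (a + b * K) ** 2
    return a + b * K

# Open-loop unstable plant (a = 1.2).
print(steady_state_closed_loop(1.2, 1.0, 1.0, 1.0, 0.0, -1))   # no noise
print(steady_state_closed_loop(1.2, 1.0, 1.0, 1.0, 10.0, -1))  # sigma = -1, large noise
print(steady_state_closed_loop(1.2, 1.0, 1.0, 1.0, 0.2, +1))   # sigma = +1
print(steady_state_closed_loop(1.2, 1.0, 1.0, 1.0, 5.0, +1))   # sigma = +1, too much noise
```

For these values the sigma = -1 controller with large noise leaves the closed-loop coefficient above one in magnitude, whereas the sigma = +1 controller either stabilizes the system or fails to exist.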

Note that we have not considered in this paper the more complex
problem in which noisy measurements of the state are made, viz.,

where {6^,0^, xq} are independent gaussian random variables. In this
case the optimal controls are restricted to be of the form

\  -  cfo)  ;  k  *  0, . . .  ,N-1,  (141) 

where $\sigma$ is $-$ or $+$ and where

The appropriate performance criterion is

N~1 

The above problem appears to be intrinsically much harder than
the perfect state case, and could be the topic of a future paper.

References 

1. M. Athans, "The Role and Use of the Stochastic Linear-Quadratic-
Gaussian Problem in Control System Design," IEEE Trans. Automatic
Control, Special Issue on Linear-Quadratic-Gaussian Problem,
Vol. 16, Dec. 1971, pp. 529-552.

2. D.H. Jacobson and J.L. Speyer, "Necessary and Sufficient Conditions
for Optimality for Singular Control Problems: A Limit Approach,"
J. Math. Anal. Appl. 34, 1971, pp. 239-265.




13.  Acknowledgements 

I  wish  to  thank  Lotfi  Zadeh  for  stimulating  discussions,  during 
the  Spring  of  1971,  on  fuzzy  set  theory.  These  discussions  contributed 
to  the  development  of  certain  results  in  this  paper.  Critical  comments 
from David Mayne, Larry Ho and Jason Speyer are appreciated.


Appendix

Lemma 1  If  >  0,  then


iff 


m  '  ac>!~  1  VlVl Vl  'dalc 


r'p  r  '  ^ 


1 


i 


V-VWiL.  m  ...  i 


A.l) 


'tT1' 
•  k 


where  V~  is  defined  in  (39). 

Proof: The left hand side of (A.1) is, using (1), equal to

is proved by (A.2) since the integrand is a probability density
function having mean and covariance

where W is defined in (54).

then the left hand side of (A.5) is infinite.

Proof: i) The proof is the same as that of Lemma 1 with  replaced

ii) We have that

and we note that because of (A.6) there exists a direction a* such that
the right hand side of (A.7) does not go to zero as $\|a^*\| \to \infty$.
Clearly this implies divergence of the integral on the left hand side
of (A.5).


which  is  equation  (98). 


